Detection and Tracking of Moving Objects

ABSTRACT

Techniques for performing visual surveillance of one or more moving objects are provided. The techniques include registering one or more images captured by one or more cameras, wherein registering the one or more images comprises region-based registration of the one or more images in two or more adjacent frames, performing motion segmentation of the one or more images to detect one or more moving objects and one or more background regions in the one or more images, and tracking the one or more moving objects to facilitate visual surveillance of the one or more moving objects.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/972,836, filed Dec. 20, 2010, incorporated by reference herein.

FIELD OF THE INVENTION

Embodiments of the invention generally relate to information technology, and, more particularly, to object detection.

BACKGROUND OF THE INVENTION

In recent years, reconnaissance, surveillance, disaster relief, search and rescue, agricultural information gathering and fast remote sensing mapping have gained increasing attention for civilian and military purposes. For example, due to its small size and low-cost sensor platform, an unmanned aerial vehicle (UAV) can be an attractive platform for executing such operations. However, a UAV introduces some significant challenges when used in surveillance systems. For instance, the background changes significantly because the camera undergoes fast motion and irregular rotation, and the motion of a UAV is usually not smooth. Further, the frame rate is very low (for example, 1 frame per second), which increases the difficulty of detecting and tracking ground moving targets, and small object size poses another challenge for object detection and tracking. Also, strong illumination changes and stripe noise in the camera imagery can make it hard to separate true moving objects from the background.

Existing approaches also suffer from object initialization issues, and are additionally unable to obtain high-accuracy registration results, to handle rotation and scale variation of a target, and to deal with similar distributions between target and background.

SUMMARY OF THE INVENTION

Principles and embodiments of the invention provide techniques for detection and tracking of moving objects. An exemplary method (which may be computer-implemented) for performing visual surveillance of one or more moving objects, according to one aspect of the invention, can include steps of registering one or more images captured by one or more cameras, wherein registering the one or more images comprises region-based registration of the one or more images in two or more adjacent frames, performing motion segmentation of the one or more images to detect one or more moving objects and one or more background regions in the one or more images, and tracking the one or more moving objects to facilitate visual surveillance of the one or more moving objects.

One or more embodiments of the invention or elements thereof can be implemented in the form of a computer product including a tangible computer readable storage medium with computer useable program code for performing the method steps indicated. Furthermore, one or more embodiments of the invention or elements thereof can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps. Yet further, in another aspect, one or more embodiments of the invention or elements thereof can be implemented in the form of means for carrying out one or more of the method steps described herein; the means can include (i) hardware module(s), (ii) software module(s), or (iii) a combination of hardware and software modules; any of (i)-(iii) implement the specific techniques set forth herein, and the software modules are stored in a tangible computer-readable storage medium (or multiple such media).

These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating sub-pixel position estimation, according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating sub-region selection, according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating forward and backward geometric registration, according to an embodiment of the present invention;

FIG. 4 is a flow diagram illustrating forward and backward frame differencing, according to an embodiment of the present invention;

FIG. 5 is a flow diagram illustrating false blob filtering, according to an embodiment of the present invention;

FIG. 6 is a flow diagram illustrating multi-object tracking, according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating reference plane-based registration and tracking, according to an embodiment of the present invention;

FIG. 8 is a flow diagram illustrating automatic urban road extraction, according to an embodiment of the present invention;

FIG. 9 is a block diagram illustrating architecture of an object detection and tracking system, according to an aspect of the invention;

FIG. 10 is a flow diagram illustrating techniques for performing visual surveillance of one or more moving objects, according to an embodiment of the invention; and

FIG. 11 is a system diagram of an exemplary computer system on which at least one embodiment of the invention can be implemented.

DETAILED DESCRIPTION OF EMBODIMENTS

Principles of the invention include detection, tracking, and searching of moving objects in visual surveillance. In an example setting including moving objects and one or more moving cameras, one or more embodiments of the invention include motion segmentation (motion blobs versus background region), multiple object tracking (for example, consistent tracking over time) and reference plane-based registration and tracking. As detailed herein, one or more embodiments of the invention include using multiple cameras (for example, registered with each other) mounted, for example, on mobile platforms (for example, an unmanned aerial vehicle (UAV)) to detect, track and search for moving objects by forming a panoramic view from the images received from the cameras based on global/local geometric registration, motion segmentation, moving object tracking, reference plane-based registration and tracking and automatic urban road extraction.

The techniques described herein include recursive geometric registration, which includes region-based image registration for adjacent frames instead of for an entire frame, sub-pixel image matching techniques, and region-based geometric transformation for handling lens geometric distortion. Also, one or more embodiments of the invention include two-way motion detection and hybrid target tracking using colors and features. Two-way motion detection includes forward and backward frame differencing, automatic dynamic threshold estimation based on temporal and/or spatial filtering, as well as false moving pixel removal based on independent motions of features. Hybrid target tracking includes a Kanade-Lucas-Tomasi (KLT) feature tracker and meanshift, automatic kernel scale estimation and updating, and consistent tracking over time using coherent motion of feature trajectories.

Further, the techniques detailed herein include multi-target tracking algorithms based on feature matching and distance matrices for small targets, as well as, for example, a UAV surveillance system implementation with a low frame rate (1 frame/second) for detecting and tracking targets of small size (for example, without any known shape model).

As noted herein, one or more embodiments of the invention include local/global geometric registration of videos (for example, UAV videos). In order to reduce the effect of camera motion, a frame-to-frame video registration process is implemented. An accurate way to register two images is to match every pixel in each image. However, the computational cost of doing so is prohibitive. An efficient alternative is to find a relatively small set of feature points in the image that will be easy to find again, and to use only those points to estimate a frame-to-frame homography. By way of example only, 500-600 feature points can be extracted for an image of 1280×1280 pixels.

A Harris corner detector can be applied to image registration and motion detection due to its invariance to scale, rotation and illumination variation. In one or more embodiments of the invention, a Harris corner detector can be used as a feature point detector. Its algorithm can be described as follows:

1. For each pixel in an image I, compute its x- and y-directional derivatives $I_x$ and $I_y$, and the product $I_{xy} = I_x I_y$.

2. Apply a window function A, that is, $h_x = A I_x$, $h_y = A I_y$, $h_{xy} = A I_{xy}$.

3. Compute $H = h_x h_y - h_{xy}^2 - \kappa (h_x + h_y)^2$ (κ is a constant) to measure variations in both directions.

4. Threshold H and find local maxima to obtain corners.
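The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the embodiment's implementation. It assumes the standard Harris formulation, in which the windowed quantities in step 3 are the squared derivatives and their product, and it assumes a 3×3 box window for A, central-difference derivatives and κ = 0.04. The function name `harris_response` is hypothetical.

```python
def harris_response(img, kappa=0.04):
    """Harris response H for a 2D list of gray values.

    Assumptions (not from the source): central differences for the
    derivatives, a 3x3 box window for A, and windowed squared derivatives.
    """
    rows, cols = len(img), len(img[0])
    # Step 1: directional derivatives; border pixels are left at zero.
    ix = [[0.0] * cols for _ in range(rows)]
    iy = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            ix[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            iy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0

    resp = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Step 2: accumulate the windowed products over a 3x3 neighborhood.
            hxx = hyy = hxy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    hxx += gx * gx
                    hyy += gy * gy
                    hxy += gx * gy
            # Step 3: corner response.
            resp[r][c] = hxx * hyy - hxy * hxy - kappa * (hxx + hyy) ** 2
    return resp


# Example: an 8x8 image whose lower-right quadrant is bright.
image = [[1.0 if r >= 4 and c >= 4 else 0.0 for c in range(8)] for r in range(8)]
response = harris_response(image)
```

Step 4 (thresholding and local-maximum selection) would then be applied to `response`. The response is positive near the corner of the bright quadrant, negative along its straight edges and zero in flat regions.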

To compare the windows, one or more embodiments of the invention include using a normalized correlation coefficient, which is an efficient statistical method. The actual feature matching is achieved by maximizing the correlation coefficient over small windows surrounding the points. The correlation coefficient is given by:

$\rho = \frac{\sum_{r=1}^{R} \sum_{c=1}^{C} \left[ g_1(r,c) - u_1 \right] \cdot \left[ g_2(r,c) - u_2 \right]}{\sqrt{\sum_{r=1}^{R} \sum_{c=1}^{C} \left[ g_1(r,c) - u_1 \right]^2 \; \sum_{r=1}^{R} \sum_{c=1}^{C} \left[ g_2(r,c) - u_2 \right]^2}}; \quad -1 \leq \rho \leq 1 \qquad (1)$

where:
g₁(r,c) represents individual gray values of the template matrix;
u₁ represents the average gray value of the template matrix;
g₂(r,c) represents individual gray values of the corresponding part of the search matrix;
u₂ represents the average gray value of the corresponding part of the search matrix; and
R, C represent the number of rows and columns of the template matrix.
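As a minimal sketch of equation (1), the following plain-Python function computes the normalized correlation coefficient between a template window and an equally sized search window. The function name `ncc` and the convention of returning 0.0 for a flat (zero-variance) window are assumptions made for illustration.

```python
import math


def ncc(template, patch):
    """Normalized correlation coefficient of two equally sized 2D windows."""
    n = len(template) * len(template[0])
    u1 = sum(map(sum, template)) / n  # average gray value of the template
    u2 = sum(map(sum, patch)) / n     # average gray value of the search window
    num = den1 = den2 = 0.0
    for row1, row2 in zip(template, patch):
        for g1, g2 in zip(row1, row2):
            d1, d2 = g1 - u1, g2 - u2
            num += d1 * d2
            den1 += d1 * d1
            den2 += d2 * d2
    den = math.sqrt(den1 * den2)
    # A flat window has no defined correlation; 0.0 is an assumed convention.
    return num / den if den > 0.0 else 0.0
```

Because the means are subtracted and the result is normalized, the coefficient is unchanged by a gain and offset applied to one window, which is why it tolerates moderate illumination differences between frames.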

Therefore, the block matching process can be achieved as follows. For each point in a reference frame, all points in the chosen frame are examined and the most similar point is chosen. Next, it is tested whether the achieved correlation is reasonably high. The point with the maximum correlation coefficient is taken as a candidate point.

Video registration requires real-time implementation. In one or more embodiments of the invention, the block-matching algorithm is implemented only for the features. As such, the computational expense can be significantly reduced.

One or more embodiments of the invention also include corresponding feature checking and outlier removal. Feature-based block matching can sometimes cause a mismatch. To avoid mismatches, one or more embodiments of the invention include using forward searching to process one-to-many matches, keeping the candidate corresponding feature with the maximum gradient value and removing the others. Also, backward searching is employed to resolve the remaining mismatches using the same approach.

In many instances, a pair of features with similar attributes is accepted as a match. Nevertheless, some false matches may occur. Therefore, in one or more embodiments of the invention, a random sample consensus (RANSAC) outlier removal procedure is performed to remove incorrect matches and improve the registration precision.

The techniques detailed herein can additionally include coarse-to-fine feature matching. Multi-resolution feature matching can reduce the search space and the number of false matches. At the coarsest resolution layer, feature matching is performed and the search scope is determined. At each subsequent resolution layer, the matching results from the previous (coarser) layer can be taken as initial results and the matching process can be performed by using equation (1) noted above. In one or more embodiments of the invention, the search scope is limited to 1-3 pixel(s). Further, the same operation can be repeated until the highest resolution layer is reached.

As additionally described herein, one or more embodiments of the invention include accurate position determination. For video registration and motion detection purposes, pixel-level accuracy may not be enough. In such instances, a sub-pixel position approach is used, in which a distance-based weighting interpolation is applied around the correlation peak. The horizontal and vertical locations of the peak are estimated separately for each feature: the one-dimensional horizontal and vertical correlation curves are obtained, the correlation values in the x and y directions are interpolated separately, and the accurate location of the peak is computed. By way of example, FIG. 1 is a diagram illustrating sub-pixel position estimation, according to an embodiment of the present invention.
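The separate horizontal and vertical refinement can be illustrated with a short sketch. The text does not spell out the distance-based weighting interpolation, so the example below substitutes a common alternative, a parabola fitted through the peak sample and its two neighbors along each axis. The function names `subpixel_offset` and `refine_peak` are hypothetical.

```python
def subpixel_offset(left, center, right):
    """Offset of the parabola vertex through three equally spaced samples.

    The result lies within [-0.5, 0.5] when `center` is the largest sample.
    """
    denom = left - 2.0 * center + right
    if denom >= 0.0:
        # Not a strict local maximum; keep the integer position.
        return 0.0
    return 0.5 * (left - right) / denom


def refine_peak(corr, r, c):
    """Refine an interior integer peak (r, c) of a 2D correlation surface.

    The vertical and horizontal 1D correlation curves are interpolated
    separately, as described in the text.
    """
    dy = subpixel_offset(corr[r - 1][c], corr[r][c], corr[r + 1][c])
    dx = subpixel_offset(corr[r][c - 1], corr[r][c], corr[r][c + 1])
    return r + dy, c + dx
```

The peak must not lie on the border of the correlation surface, since both neighbors along each axis are needed.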

The techniques described herein also include local geometric registration. By way of example, a sub-region geometric registration can be selected, and the entire frame can be divided into 2×2 sub-regions. FIG. 2 illustrates two selection models.

FIG. 2 is a diagram illustrating sub-region selection, according to an embodiment of the present invention. By way of illustration, FIG. 2 depicts sub-region selection model 202 and sub-region selection model 204.

One or more embodiments of the invention also include an affine-based local transformation, such as, for example, the following:

$\begin{bmatrix}x \\y\end{bmatrix} = \begin{bmatrix}{a_{0} + {a_{1}u} + {a_{2}v}} \\{b_{0} + {b_{1}u} + {b_{2}v}}\end{bmatrix}$

where (x, y) is the new transformed coordinate of (u, v), and $(a_j, b_k)$ (j, k = 0, 1, 2) is the set of transformation parameters. Further, to determine the local transformation parameters for each sub-region, one or more embodiments of the invention include using a least squares technique.
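A least squares estimate of the six parameters can be sketched as below. The x and y rows of the transformation are independent, so each is a three-parameter linear regression solved through its normal equations. This is an illustrative sketch only; the helper names `solve3` and `fit_affine` are hypothetical, and the code assumes at least three non-collinear point pairs.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(src, dst):
    """Least squares fit of x = a0 + a1*u + a2*v and y = b0 + b1*u + b2*v.

    src holds (u, v) points, dst the matching (x, y) points.
    Returns ([a0, a1, a2], [b0, b1, b2]).
    """
    basis = [(1.0, u, v) for u, v in src]
    # Normal equations (B^T B) p = B^T t with rows B_i = [1, u_i, v_i].
    btb = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    btx = [sum(b[i] * x for b, (x, _) in zip(basis, dst)) for i in range(3)]
    bty = [sum(b[i] * y for b, (_, y) in zip(basis, dst)) for i in range(3)]
    return solve3(btb, btx), solve3(btb, bty)
```

In a sub-region scheme, `fit_affine` would be called once per sub-region with only the matched features that fall inside it.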

One or more embodiments of the invention also include forward/backward frame-to-frame registration. For example, in instances of rapid camera motion, strong illumination variation and heavy stripe noise, forward/backward frame-to-frame registration is carried out for multi-frame differencing to avoid residual error propagation. FIG. 3 illustrates such an approach.

FIG. 3 is a diagram illustrating forward and backward geometric registration, according to an embodiment of the present invention. By way of illustration, FIG. 3 depicts frame 302 ($F_{i-1}$), frame 304 ($F_i$) and frame 306 ($F_{i+1}$). To estimate object motion at frame 304 ($F_i$), which is taken as a reference frame, previous frame 302 ($F_{i-1}$) and next frame 306 ($F_{i+1}$) are geometrically registered to the reference frame. Motion estimation for each frame is carried out in such a fashion.

Forward/backward frame differencing can also be implemented for motion detection. A diagram of the approach used in one or more embodiments of the invention is illustrated in FIG. 4. FIG. 4 is a flow diagram illustrating forward and backward frame differencing, according to an embodiment of the present invention. After forward/backward frame-to-frame images (for example, frame 402, frame 404 and frame 406) are geometrically registered and aligned in steps 408 and 410, difference images are calculated. Instead of using simple subtraction between the aligned frames, one or more embodiments of the invention use forward/backward frame differencing in steps 412 and 414 to reduce motion noise and compensate for illumination variation such as that caused by automatic gain control.

Additionally, step 416 includes performing image arithmetic via $I_{new} = \Delta I_{i-1,i}$ AND $\Delta I_{i,i+1}$. Step 418 includes median filtering, which can reduce random motion noise. To extract moving pixels of moving objects, automatic dynamic threshold estimation based on spatial filtering is carried out in step 420. Further, step 422 includes performing a morphological operation to remove small isolated spots and fill holes in the foreground image, and step 424 includes generating motion pixels (for example, a motion map).

To further reduce random noise and the effect of illumination variation, a logical AND operation is applied to the forward and backward difference images to obtain a final difference image.

$\begin{cases} D_{i-1,i}(x,y) = \left| F_{i-1}(x,y) - F_i(x,y) \right| \\ D_{i,i+1}(x,y) = \left| F_i(x,y) - F_{i+1}(x,y) \right| \\ D_i(x,y) = D_{i-1,i}(x,y) \cap D_{i,i+1}(x,y) \end{cases} \quad i = 1, 2, \ldots, N$

A threshold for each pixel is calculated automatically from the statistical characteristics and spatial high-frequency content of the difference image. Further, a morphology step can be applied to remove small isolated spots and fill holes in the foreground image.
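The forward/backward differencing and the logical AND can be sketched as follows for three already registered frames. The source does not give the threshold formula, so the sketch assumes a single global threshold per difference image of mean plus k standard deviations as a stand-in for the automatic dynamic threshold. The function names and the parameter `k` are hypothetical, and median filtering and morphology are omitted.

```python
import statistics


def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized 2D frames."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def auto_threshold(diff, k=2.0):
    """Assumed stand-in for the dynamic threshold: mean + k * std of the image."""
    values = [v for row in diff for v in row]
    return statistics.mean(values) + k * statistics.pstdev(values)


def motion_mask(prev, cur, nxt, k=2.0):
    """Binary motion map for the reference frame `cur`.

    A pixel is marked as moving only if it exceeds the threshold in both the
    backward (prev, cur) and the forward (cur, nxt) difference image.
    """
    d_back = abs_diff(prev, cur)
    d_fwd = abs_diff(cur, nxt)
    t_back = auto_threshold(d_back, k)
    t_fwd = auto_threshold(d_fwd, k)
    return [
        [int(b > t_back and f > t_fwd) for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(d_back, d_fwd)
    ]
```

A single difference image responds both where the object was and where it is now. Requiring a response in both difference images keeps only the object position in the reference frame, which is the motivation for the AND step.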

As described herein, one or more embodiments of the invention also include motion verification. FIG. 5 is a flow diagram illustrating false blob filtering, according to an embodiment of the present invention. Step 502 includes generating a motion map. Step 504 includes applying a connected component process to link the data of each blob. Step 506 includes creating a motion blob table. Step 508 includes performing an optical flow estimation. Step 510 includes making a displacement determination. If there is displacement, the process proceeds to step 512, which includes performing post-processing such as, for example, data association, object tracking, trajectory maintenance and track data management. If there is no displacement, the process proceeds to step 514, which includes filtering false blobs.

Accordingly, after a blob table is created, each blob is verified in order to remove false motion blobs from the blob table. One or more embodiments of the invention apply a KLT process to estimate the motion of each blob after forward/backward frame-to-frame registration is done. A false blob is deleted from the blob table. The process steps can include, for example, applying a connected component process to link the data of each blob, creating a blob table, extracting features for each blob in a previous registered frame, applying the KLT method to estimate the motion of each blob, and, if no motion occurs, deleting the blob from the blob table. Also, the above-noted steps can be repeated for all blobs.
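The displacement test can be sketched as below, assuming the KLT step has already produced, for each blob, a list of feature positions in the previous and current registered frames. The table layout, the use of the median displacement and the `min_displacement` value are assumptions made for illustration; the function names are hypothetical.

```python
import math
import statistics


def blob_displacement(tracks):
    """Median displacement of feature tracks [((x0, y0), (x1, y1)), ...]."""
    return statistics.median(
        math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in tracks
    )


def filter_false_blobs(blob_table, min_displacement=1.0):
    """Keep blobs whose tracked features moved after registration.

    blob_table maps a blob id to its feature tracks. Blobs without any
    tracked feature are treated as false blobs (an assumed policy).
    """
    return {
        blob_id: tracks
        for blob_id, tracks in blob_table.items()
        if tracks and blob_displacement(tracks) >= min_displacement
    }
```

The median is used here so that a single badly tracked feature does not decide whether a blob is kept.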

As also detailed herein, one or more embodiments of the invention include multi-object tracking. FIG. 6 is a flow diagram illustrating multi-object tracking, according to an embodiment of the present invention. Step 602 includes generating a motion map. Step 604 includes identifying moving blobs. Step 606 includes object initialization and step 608 includes object checking. Step 610 includes identifying object regions. Step 612 includes identifying candidate regions. Also, step 614 includes meanshift tracking and step 616 includes identifying new locations.

Additionally, after identifying object regions in step 610, features can be extracted in step 618. Once a search region is set in step 620, moving blobs can be found as potential object candidates in step 622. KLT matching is performed in step 624 and outlier removal based on an affine transform with RANSAC is performed in step 626. A new region candidate is identified in step 628. Meanwhile, meanshift is applied in step 614 to compute the inter-frame translation. This yields a candidate region location in step 616. From steps 628 and 616, the process can proceed to step 630, which determines the final region location based on the Bhattacharyya coefficient. Also, step 632 includes target model updating for solving drift issues, and step 634 includes trajectory updating. Also, to track moving objects, a hybrid tracking model based on a combination of the KLT and meanshift methods is applied from step 618 to step 630.

As noted, the techniques described herein include object initialization. The motion detection results from forward/backward frame differencing can contain some true moving objects and some false objects, and can miss some true objects. By way of example, for a UAV video with a low frame rate (for example, 1 frame/second), a moving object does not have any overlapping regions between two consecutive frames, so traditional methods for object initialization will not work. To efficiently isolate promising moving objects among all detection results for the current frame, one or more embodiments of the invention include combining a distance matrix with a similarity measure to initialize moving objects. The processing steps can include, for example, the following.

A search radius, a matching score threshold and a minimum length of tracked history are set. The distance matrix between the objects (including object candidates) and all the blobs in the table is computed. If the length of an object trajectory is less than the preset value, a kernel-based algorithm is applied to find the match between the object candidate and the blobs in terms of a preset matching score. Also, if the object candidate appears in several consecutive frames, the candidate is initialized and stored in the object table. Otherwise, the object candidate is considered a false object.
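The distance matrix and the search-radius gating can be sketched as follows. Objects and blobs are represented by their 2D centers, which is an assumption, and the kernel-based similarity scoring that would rank the gated candidates is left out. The function names are hypothetical.

```python
import math


def distance_matrix(objects, blobs):
    """Euclidean distances; rows index objects, columns index blobs."""
    return [
        [math.hypot(ox - bx, oy - by) for bx, by in blobs]
        for ox, oy in objects
    ]


def gate_candidates(objects, blobs, search_radius):
    """For each object, the indices of blobs inside the search radius,
    ordered from nearest to farthest."""
    gated = []
    for row in distance_matrix(objects, blobs):
        inside = [j for j, d in enumerate(row) if d <= search_radius]
        gated.append(sorted(inside, key=row.__getitem__))
    return gated
```

Gating by distance first keeps the more expensive similarity measure from being evaluated for blobs that are too far away to be the same object, which matters at low frame rates where objects do not overlap between frames.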

One or more embodiments of the invention include projecting the blob set of the previous frame into the current frame after geometric registration. The motion of each object relative to its previous position can be estimated by a KLT tracking process. In a KLT tracking process, the motion model is approximated by an affine transformation such that $I_{curr}(Ax + T) = I_{prev}(x)$, where A is a two-dimensional (2D) transformation matrix and T is the translation vector.

In one or more embodiments of the invention, affine transformation parameters can be computed from as few as four feature points. A least squares technique can be used to compute these parameters.

Accuracy estimation can be performed, for example, when mismatched pairs occur. One measure of tracking accuracy is the root mean square error (RMSE) between the matched points before and after applying the affine transformation. This measure is used as a criterion to eliminate the matches that are considered imprecise.

Additionally, to eliminate the outliers, one or more embodiments of the invention include performing the RANSAC algorithm to sequentially remove mismatches in an iterative fashion until the RMSE value is lower than the desired threshold.

The techniques detailed herein additionally include meanshift tracking and object representation. By way of example, for a UAV tracking system, traditional intensity-based target representation is no longer suitable for multi-object tracking due to large scale variation and perspective geometric distortion. To efficiently characterize the object, a histogram-based feature space can be chosen. In one or more embodiments of the invention, a metric based on the Bhattacharyya coefficient is used to define a similarity measure between a reference object and a candidate for multi-object tracking. Given an object region histogram q in the reference frame, the Bhattacharyya coefficient based objective function is given by:

${\rho \left( {p,q} \right)} = {\sum\limits_{u = 1}^{M}{{p_{u}(x)}{q_{u}\left( x_{0} \right)}}}$

where M is the histogram dimension, and x₀ is the 2D center.

The candidate region histogram $p_u(x)$ at 2D center x in the current frame is defined as:

$p_u(x) = \frac{\sum_{i} k\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) \delta\left( b(x_i), u \right)}{\sum_{i} k\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}$

Here, u = 1, 2, . . . , M. k(x) denotes a non-negative, non-increasing and piecewise-differentiable kernel profile which weights the pixel location, h is the 2D bandwidth vector of k(x), δ is the Kronecker delta function and each pixel value is denoted by $b(x_i)$.

Additionally, in one or more embodiments of the invention, in determining a similarity measure between distributions, the Bhattacharyya distance can include $B(I_x, I_y) = \sqrt{1 - \rho(p_x, p_y)}$, where $\rho(p_x, p_y) = \int \sqrt{\hat{p}_x(u)\, \hat{p}_y(u)}\, du$, and where $p_x$ and $p_y$ represent the target and the candidate distributions, respectively.
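The kernel-weighted histogram and the Bhattacharyya measures can be sketched as follows. The sketch assumes the Epanechnikov profile k(r) = 1 − r for r ≤ 1 (and 0 otherwise), a scalar bandwidth h instead of a 2D bandwidth vector, and pixels supplied as position and histogram bin pairs. All function names are hypothetical.

```python
import math


def kernel_histogram(pixels, center, h, bins):
    """Kernel-weighted, normalized histogram of a region.

    pixels: iterable of ((x, y), bin_index) with bin_index in [0, bins).
    Uses the Epanechnikov profile (an assumed choice) and a scalar bandwidth.
    """
    hist = [0.0] * bins
    for (x, y), b in pixels:
        r2 = ((x - center[0]) / h) ** 2 + ((y - center[1]) / h) ** 2
        hist[b] += max(0.0, 1.0 - r2)  # pixels near the center weigh more
    total = sum(hist)
    return [v / total for v in hist] if total > 0.0 else hist


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u); equals 1 for identical histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q):
    """sqrt(1 - coefficient), clamped against rounding slightly above 1."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))
```

For normalized histograms the coefficient lies between 0 (no common bins) and 1 (identical distributions), so the distance also lies between 0 and 1.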

The techniques described herein can additionally include object positioning. To search for the location corresponding to the object from one frame to the next, one or more embodiments of the invention include applying a meanshift tracking algorithm that is based on a gradient ascent optimization rather than an exhaustive search. Strengths of the meanshift method include computational effectiveness and suitability to real-time application. However, a target can be lost, for example, due to an intrinsic limitation of exploring local maxima, especially when the tracked object moves quickly. The candidate region histogram $p_u(x)$ can be obtained from the above equation.

The new location of the tracked object can be estimated as:

$\hat{y}_1 = \frac{\sum_{i=1}^{n} X_i\, \omega_i\, g\left( \left\| \frac{\hat{y}_0 - X_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} \omega_i\, g\left( \left\| \frac{\hat{y}_0 - X_i}{h} \right\|^2 \right)}$

where:

$\omega_i = \sum_{u=1}^{M} \delta\left[ b(X_i) - u \right] \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}$

and $g(x) = -k'(x)$, that is, the negative of the derivative of the kernel profile k(x).
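One iteration of this location update can be sketched as below. The sketch assumes the Epanechnikov profile k(r) = 1 − r on r ≤ 1, for which g is constant (equal to 1) inside the kernel support, and a scalar bandwidth h. The histograms q and p are assumed to be precomputed, and the function name `mean_shift_step` is hypothetical.

```python
import math


def mean_shift_step(pixels, y0, h, q, p):
    """One meanshift update of the region center.

    pixels: list of ((x, y), bin_index) around the current center y0.
    q, p:   target histogram and candidate histogram evaluated at y0.
    With the assumed Epanechnikov profile, g = 1 inside the support, so the
    new center is the weighted mean of the pixel positions.
    """
    num_x = num_y = den = 0.0
    for (x, y), u in pixels:
        r2 = ((y0[0] - x) / h) ** 2 + ((y0[1] - y) / h) ** 2
        if r2 > 1.0 or p[u] <= 0.0:
            continue  # outside the kernel support, or empty candidate bin
        w = math.sqrt(q[u] / p[u])  # weight of the pixel's histogram bin
        num_x += x * w
        num_y += y * w
        den += w
    if den == 0.0:
        return y0  # nothing to update from; keep the current center
    return (num_x / den, num_y / den)
```

In a tracker this step would be repeated, recomputing p at each new center, until the shift falls below a small tolerance. Pixels whose bin is under-represented in the candidate relative to the target receive larger weights and pull the center toward them.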

One or more embodiments of the invention can also include target model updating on a temporal domain. In some circumstances, a meanshift approach without target model updating can suffer from abrupt changes in the target model. On the other hand, updating the model at every frame can decrease the reliability of the tracking results due to cluttered environments, occlusion, random noise, etc. One way to change the target model is to periodically update the target distributions.

To obtain a precise tracking result, the target model can be updated dynamically. Accordingly, one or more embodiments of the invention include model updating that uses both recent tracking results and the older target model to influence the current target model for object tracking. The updating procedure is formulated as:

$q_u^{new} = (1 - \alpha)\, q_u^{old} + \alpha \cdot p_u^{s}$

Here, the superscripts new and old denote the newly obtained target model and the old model, respectively. s represents the recent tracking result. α weights the contribution of the recent tracking result (normally < 0.1). q and p represent the target model and the candidate model, respectively.
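The update is a per-bin convex combination of the old model and the recent result, as the following minimal sketch shows. The function name `update_target_model` and the default α = 0.05 are assumptions chosen within the range stated above.

```python
def update_target_model(q_old, p_recent, alpha=0.05):
    """Blend the old target histogram with the recent tracking result.

    alpha weights the recent result; small values make the model adapt slowly.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * q + alpha * p for q, p in zip(q_old, p_recent)]
```

If both input histograms are normalized, the result is normalized as well, so no renormalization step is needed after the update.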

Further, one or more embodiments of the invention include target model updating on a spatial domain. Normally, meanshift-based tracking rarely provides a precise boundary position for the tracked object because it does not use spatial data. In contrast, the results derived from the KLT tracker and from motion detection can provide much more accurate information, such as the precise position and object size, than the meanshift tracker.

Each individual algorithm may be unable to do a perfect job on multi-object tracking. Thus, fusion among their data can be used in a multi-object tracking procedure. According to the strengths of each method, one or more embodiments of the invention use the following merging method:

$\text{Output} = \begin{cases} \text{result by motion detector}; & \text{if Overlapping} \geq T \\ \text{KLT result}; & \text{if outlier for MS occurs} \\ \text{result by meanshift}; & \text{otherwise} \end{cases}$

where Overlapping represents the degree of region overlap, T is a threshold, and MS denotes meanshift.
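The merging rule can be sketched as a simple decision function. The source does not define how the overlap is measured, so the sketch assumes intersection over union between the motion detector box and the meanshift box, with boxes given as (x0, y0, x1, y1). The function names and the default threshold of 0.5 are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def fuse(motion_box, klt_box, ms_box, ms_is_outlier, overlap_threshold=0.5):
    """Choose the output region following the three-case merging rule."""
    if motion_box is not None and iou(motion_box, ms_box) >= overlap_threshold:
        return motion_box  # the motion detector agrees with the tracker
    if ms_is_outlier:
        return klt_box     # the meanshift result is unreliable
    return ms_box          # default: the meanshift result
```

The cases are tested in the order given by the rule, so the motion detector result takes precedence whenever it overlaps sufficiently.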

FIG. 7 is a diagram illustrating reference plane-based registration and tracking, according to an embodiment of the present invention. By way of illustration, FIG. 7 depicts a geo-reference plane 702. The first frame 704 is registered to geo-reference plane 702, and the second frame 706 is registered to geo-reference plane 702 from the first registered frame and the corresponding inter-frame transformation parameters $TC_i$ (equation 712 in FIG. 7). In such fashion, frames 708 and 710 are each registered to geo-reference plane 702. Moreover, each object is projected into geo-reference plane 702 using navigation data.
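The chaining of registrations can be sketched with homogeneous 3×3 transforms. Equation 712 is not reproduced in the text, so the sketch assumes the simplest reading: the transform of a frame to the reference plane is the transform of the preceding frame to the reference plane composed with the inter-frame transform. The function names are hypothetical.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def register_to_reference(first_to_ref, inter_frame):
    """Cumulative frame-to-reference transforms.

    first_to_ref:   transform of the first frame into the reference plane.
    inter_frame[i]: transform mapping frame i+1 into frame i.
    """
    cumulative = [first_to_ref]
    for t in inter_frame:
        cumulative.append(matmul3(cumulative[-1], t))
    return cumulative


def apply_transform(t, point):
    """Map a 2D point through a homogeneous 3x3 transform."""
    x, y = point
    w = t[2][0] * x + t[2][1] * y + t[2][2]
    return (
        (t[0][0] * x + t[0][1] * y + t[0][2]) / w,
        (t[1][0] * x + t[1][1] * y + t[1][2]) / w,
    )
```

Because the transforms are accumulated, registration errors also accumulate along the chain, which is one reason the inter-frame registration needs sub-pixel accuracy.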

FIG. 8 is a flow diagram illustrating automatic urban road extraction,according to an embodiment of the present invention. Step 802 includesframing an image. Step 804 includes performing a Gaussian smoothingoperation. Also, step 806 includes using a canny detector and step 808includes implementing a hough transformation. Step 810 includesdetermining a maximum response finding. Step 812 includes determining ifthe length of the stripe is greater than a pre-defined threshold. If thelength of the stripe is not greater than the threshold, the processstops at step 814. If the length of the stripe is greater than thethreshold, the process continues to step 816, which includes performinga straight line extraction. Further, step 818 includes performing stripepixels removal (which can, for example, lead to a return to step 808).

As also depicted in FIG. 8, step 820 includes performing framedifferencing, and step 822 includes verification via motion historyimages (MHI) (which can, for example, lead to a return to step 816).Additionally, one or more embodiments of the invention can also includeextraction of road stripes via iterative hough transform.

As detailed herein, one or more embodiments of the invention includerecursive geometric registration with sub-pixel matching accuracy thatcan handle various geometrical residual errors from un-calibratedcamera. Additionally, the techniques detailed herein include motiondetection based on forward/backward frame differencing that canefficiently separate moving objects from background. Further, a hybridobject tracker can be implemented that uses colors, features andintensity statistical characteristics overtime to detect and trackmultiple small objects.

FIG. 9 is a block diagram illustrating architecture of an objectdetection and tracking system, according to an aspect of the invention.An example software architecture construction for a detection andtracking system (for example, a UAV system) can be built on multipleservices to provide a track database for object search and intelligentanalysis. As illustrated in FIG. 9, the software architecture caninclude multiple sensor modules 904, video streaming service modules906, tracking suite service modules 908, a track database (DB) servermodule 910, a user interface module 902 and a visualization console 912.A video streaming module 906 serves to capture and make availableimagery from multiple sensors. The acquired images are used by atracking suite module 908 as the basis for multi-object detection andtracking. Tracking suite modules 908 includes a geometric registrationsub-module 914, a motion extraction sub-module 916, an object trackingsub-module 918, a tracking data sub-module 920 and a geo-coordinatemapping sub-module 922.

By processing the real-time imagery from multiple sensors, sophisticatedtransformation of data to track information is achieved. Track DB server910 serves track metadata management. Visualization console 912 createsgraphical overlays, indexes them to the imagery on the display, andpresents them to a user. These overlays can be any type of graphicalinformation that supports the higher level components, such as, forexample, class types, moving directions, trajectories and object sizes.User interface 902 provides data access and operation by the user.

FIG. 10 is a flow diagram illustrating techniques for performing visual surveillance of one or more moving objects, according to an embodiment of the present invention. Step 1002 includes registering one or more images captured by one or more cameras, wherein registering the one or more images comprises region-based registration of the one or more images in two or more adjacent frames. This step can be carried out, for example, using a geometric registration sub-module 914 in tracking suite service module 908. Registering images can include recursive global and local geometric registration of the one or more images (for example, region-based geometric transformation for handling lens geometric distortion). Registering images can also include using sub-pixel image matching techniques.
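The description does not specify which sub-pixel matching technique is used. As a minimal sketch under that caveat, the Python below refines the integer location of a best match by fitting a parabola through the matching score at the peak and its two neighbors, a common stand-in. The function name `subpixel_peak` and the one-dimensional list of scores are assumptions made for illustration only.

```python
def subpixel_peak(scores):
    """Return the peak position of a 1-D matching-score curve with sub-sample accuracy.

    Assumption: `scores` holds similarity values (higher is better) sampled at
    integer offsets 0, 1, 2, ... along one search direction.
    """
    # Integer location of the best match.
    i = max(range(len(scores)), key=scores.__getitem__)
    # A parabola needs both neighbors; at the border keep the integer result.
    if i == 0 or i == len(scores) - 1:
        return float(i)
    left, centre, right = scores[i - 1], scores[i], scores[i + 1]
    curvature = left - 2.0 * centre + right
    if curvature == 0.0:
        return float(i)  # flat neighborhood, no refinement possible
    # Vertex of the parabola through the three samples.
    return i + 0.5 * (left - right) / curvature


if __name__ == "__main__":
    # Scores sampled from a curve whose true maximum lies between samples 2 and 3.
    samples = [-(x - 2.3) ** 2 for x in range(6)]
    print(subpixel_peak(samples))
```

For a two-dimensional search window the same refinement could be applied independently along each axis; that extension is likewise an assumption, not something the description states.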

Step 1004 includes performing motion segmentation of the one or more images to detect one or more moving objects and one or more background regions in the one or more images. This step can be carried out, for example, using a motion extraction sub-module 916 in tracking suite service module 908. Performing motion segmentation of the images can include forward and backward frame differencing. Forward and backward frame differencing can include, for example, automatic dynamic threshold estimation based on temporal filtering and/or spatial filtering, removing false moving pixels based on independent motions of image features, and performing a morphological operation and generating motion pixels.
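Forward and backward frame differencing can be illustrated with a small Python sketch. It assumes three already-registered grayscale frames stored as lists of rows, and it uses "mean plus k standard deviations" of each difference image as a stand-in for the automatic dynamic threshold, since the exact estimator is not given here. The names `motion_mask` and `dynamic_threshold` and the parameter `k` are hypothetical. The false-pixel removal and morphological steps are omitted.

```python
from statistics import mean, pstdev


def abs_diff(a, b):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def dynamic_threshold(diff, k=2.0):
    """Assumed estimator: mean + k * standard deviation of the difference image."""
    values = [v for row in diff for v in row]
    return mean(values) + k * pstdev(values)


def motion_mask(prev_frame, cur_frame, next_frame, k=2.0):
    """Mark pixels that changed in both the backward and the forward difference."""
    backward = abs_diff(cur_frame, prev_frame)
    forward = abs_diff(next_frame, cur_frame)
    t_back = dynamic_threshold(backward, k)
    t_fwd = dynamic_threshold(forward, k)
    return [
        [1 if (b > t_back and f > t_fwd) else 0 for b, f in zip(rb, rf)]
        for rb, rf in zip(backward, forward)
    ]


def make_frame(col, size=5, background=10, value=200):
    """Hypothetical test frame: one bright pixel in row 2 at column `col`."""
    frame = [[background] * size for _ in range(size)]
    frame[2][col] = value
    return frame


if __name__ == "__main__":
    for row in motion_mask(make_frame(1), make_frame(2), make_frame(3)):
        print(row)
```

Requiring a pixel to exceed the threshold in both differences keeps the object's position in the current frame and drops the "ghost" positions it occupied in the previous and next frames.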

Step 1006 includes tracking the one or more moving objects to facilitate visual surveillance of the one or more moving objects. This step can be carried out, for example, using an object tracking sub-module 918 in tracking suite service module 908. Tracking the moving objects can include performing hybrid target tracking, wherein hybrid target tracking includes using a Kanade-Lucas-Tomasi feature tracker and meanshift, using auto kernel scale estimation and updating, and using feature trajectories. One or more embodiments of the invention can also include using colors for tracking. Tracking moving objects can additionally include using multi-target tracking algorithms based on feature matching and distance matrices for one or more (small) targets.
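One way to picture multi-target association with a distance matrix is the Python sketch below. It assumes each track and each detection is summarized by a 2-D centroid, and it uses a greedy nearest-pair assignment with a gating distance. Both the greedy strategy and the names `greedy_match` and `max_dist` are illustrative assumptions; the description does not say which assignment rule is used.

```python
import math


def distance_matrix(tracks, detections):
    """Euclidean distance between every track centroid and every detection centroid."""
    return [[math.dist(t, d) for d in detections] for t in tracks]


def greedy_match(tracks, detections, max_dist=20.0):
    """Assign detections to tracks, closest pairs first.

    Returns a dict {track_index: detection_index}. Pairs farther apart than
    `max_dist` (an assumed gating value) are left unmatched.
    """
    dm = distance_matrix(tracks, detections)
    pairs = sorted(
        (dm[i][j], i, j)
        for i in range(len(tracks))
        for j in range(len(detections))
    )
    used_tracks, used_detections, matches = set(), set(), {}
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if i in used_tracks or j in used_detections:
            continue
        matches[i] = j
        used_tracks.add(i)
        used_detections.add(j)
    return matches


if __name__ == "__main__":
    tracks = [(0, 0), (10, 10)]
    detections = [(11, 9), (1, 1), (100, 100)]
    print(greedy_match(tracks, detections))
```

Unmatched detections, such as the one at (100, 100) in the usage example, would be candidates for new-object initialization.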

Also, tracking moving objects can include generating a motion map, identifying one or more moving objects (blobs), performing object initialization and object checking, identifying object regions in the motion map, extracting features, setting a search region in the motion map, identifying candidate regions in the motion map, meanshift tracking, identifying moving objects in the candidate regions, performing Kanade-Lucas-Tomasi feature matching, performing an affine transform (with RANSAC), making a final regions determination via the Bhattacharyya coefficient, and updating a target model and trajectory information. Tracking moving objects can additionally include reference plane-based registration and tracking.
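The final region determination via the Bhattacharyya coefficient can be sketched as follows. The example assumes that the target model and each candidate region are represented by normalized intensity histograms. The bin count, the function names and the flat lists of pixel values are illustrative choices, not details taken from the description.

```python
import math


def normalized_histogram(pixels, bins=8, max_value=256):
    """Histogram of integer pixel values in [0, max_value), scaled to sum to 1.

    Assumes `pixels` is non-empty.
    """
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def best_candidate(target_pixels, candidates):
    """Return (index, coefficient) of the candidate region most similar to the target."""
    target = normalized_histogram(target_pixels)
    scores = [bhattacharyya(target, normalized_histogram(c)) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    target = [200, 210, 220, 205]
    regions = [[10, 20, 30, 15], [201, 211, 221, 206]]
    print(best_candidate(target, regions))
```

A tracker could accept the winning region only when its coefficient exceeds a chosen threshold before updating the target model; that acceptance rule is an assumption rather than something stated above.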

The techniques depicted in FIG. 10 can also include relating each camera view with one or more other camera views, and forming a panoramic view from the images captured by one or more cameras. One or more embodiments of the invention additionally include estimating motion of each camera based on video information of static objects in the panoramic view, as well as estimating one or more background (for example, road) structures in the panoramic view based on linear structure detection and statistical analysis of the moving objects over a period of time.

Further, the techniques depicted in FIG. 10 include automatic feature (for example, a road) extraction, wherein automatic feature extraction includes framing an image, performing a Gaussian smoothing operation, using a Canny detector to extract one or more feature (for example, road) edges, implementing a Hough transformation for feature (for example, road stripe) analysis, determining a maximum response finding for reducing an influence of multiple peaks in a transform space, determining if a length of a feature (for example, a road stripe) is greater than a certain threshold, and if the length of the feature is greater than the threshold, performing feature extraction and pixel removal. Automatic feature extraction can additionally include performing frame differencing and verification via motion history images.
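The Hough voting, maximum-response selection and length test can be pictured with the short Python sketch below. It assumes the smoothing and edge-detection stages have already produced a list of (x, y) edge points. The accumulator resolution, the support tolerance and the names `strongest_line`, `supported_length` and `extract_long_feature` are hypothetical.

```python
import math
from collections import Counter


def strongest_line(edge_points, theta_steps=180, rho_step=1.0):
    """Vote in (rho, theta) space and return the single maximum-response cell."""
    votes = Counter()
    for x, y in edge_points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), t)] += 1
    (rho_bin, t), count = max(votes.items(), key=lambda item: item[1])
    return rho_bin * rho_step, math.pi * t / theta_steps, count


def supported_length(edge_points, rho, theta, tolerance=0.5):
    """Extent along the line of the edge points lying within `tolerance` of it."""
    along = [
        -x * math.sin(theta) + y * math.cos(theta)
        for x, y in edge_points
        if abs(x * math.cos(theta) + y * math.sin(theta) - rho) <= tolerance
    ]
    return max(along) - min(along) if along else 0.0


def extract_long_feature(edge_points, min_length=10.0):
    """Return (rho, theta, length) for the strongest line if it is long enough, else None."""
    rho, theta, _ = strongest_line(edge_points)
    length = supported_length(edge_points, rho, theta)
    return (rho, theta, length) if length > min_length else None


if __name__ == "__main__":
    # A horizontal stripe at y = 5 plus two stray edge points.
    points = [(x, 5) for x in range(20)] + [(3, 15), (12, 0)]
    print(extract_long_feature(points))
```

Keeping only the single strongest accumulator cell is one simple reading of the maximum-response step. Neighboring cells can tie for the same stripe, so the returned angle may differ from the true one by about one accumulator step.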

One or more embodiments of the invention also include performing outlier removal to remove incorrect moving object matches (and improve the registration precision). The techniques depicted in FIG. 10 can additionally include false blob filtering. False blob filtering includes generating a motion map, applying a connected component process to link each blob data, creating a motion blob table, extracting features for each blob in a previously registered frame, and applying a Kanade-Lucas-Tomasi method to estimate motion of each blob, and, if no motion occurs for a blob, deleting the blob from the blob table.
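The false blob filtering sequence can be sketched in Python as below. The connected component step uses a 4-connected breadth-first search over a binary motion map. The Kanade-Lucas-Tomasi motion estimate is not implemented here. Instead, the caller passes a hypothetical `estimate_motion` callback that returns a displacement magnitude for a blob, and `min_motion` is an assumed cut-off.

```python
from collections import deque


def label_blobs(motion_map):
    """Build a blob table {blob_id: [(row, col), ...]} from a binary motion map."""
    rows, cols = len(motion_map), len(motion_map[0])
    seen = [[False] * cols for _ in range(rows)]
    table, next_id = {}, 1
    for r in range(rows):
        for c in range(cols):
            if not motion_map[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected motion pixels.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and motion_map[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            table[next_id] = pixels
            next_id += 1
    return table


def filter_false_blobs(table, estimate_motion, min_motion=0.5):
    """Keep only blobs whose estimated motion reaches `min_motion` (assumed cut-off)."""
    return {bid: px for bid, px in table.items() if estimate_motion(px) >= min_motion}


if __name__ == "__main__":
    motion_map = [
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
    ]
    blobs = label_blobs(motion_map)
    # Stand-in for a feature-tracker estimate: the top-left blob is reported as static.
    moving = filter_false_blobs(blobs, lambda px: 0.0 if (0, 0) in px else 2.0)
    print(sorted(moving))
```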

Additionally, one or more embodiments of the invention can include updating a target model on a temporal domain and/or a spatial domain, as well as creating an index (for example, a searchable index) of object appearances and object tracks in a panoramic view. Also, the object appearance and tracks template index can be stored in a template data store with a pointer to the corresponding video segments for easy retrieval. Further, one or more embodiments of the invention can include determining a similarity metric between a query and an entry in the index, which can facilitate searching for the object appearance and tracks in a template data store/index based on the similarity metric, and outputting/listing the search results for a human operator based on similarity of the query.
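A searchable index of this kind might look like the Python sketch below. It assumes each index entry carries a numeric appearance descriptor and a pointer to its video segment, and it uses cosine similarity as the metric. The description does not name a particular metric or record layout, so `IndexEntry`, its fields and `search` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexEntry:
    track_id: int
    features: list       # appearance descriptor (assumed numeric vector)
    video_segment: str   # pointer to the stored video segment (assumed string key)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, top_k=3):
    """Return up to `top_k` (entry, similarity) pairs, most similar first."""
    scored = [(entry, cosine_similarity(query, entry.features)) for entry in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    index = [
        IndexEntry(1, [1.0, 0.0, 0.0], "seg_a"),
        IndexEntry(2, [0.0, 1.0, 0.0], "seg_b"),
        IndexEntry(3, [0.9, 0.1, 0.0], "seg_c"),
    ]
    for entry, score in search(index, [1.0, 0.0, 0.0], top_k=2):
        print(entry.track_id, entry.video_segment, round(score, 3))
```

The ranked list is what would be presented to a human operator, with each entry's `video_segment` pointer used to retrieve the matching footage.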

The techniques depicted in FIG. 10 can also, as described herein, include providing a system, wherein the system includes distinct software modules, each of the distinct software modules being embodied on a tangible computer-readable recordable storage medium. All the modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example. The modules can include any or all of the components shown in the figures. In one or more embodiments, the modules include sensor modules, video streaming service modules, tracking suite service modules (including the sub-modules detailed herein), a track database (DB) server module, a user interface module and a visualization console module that can run, for example, on one or more hardware processors. The method steps can then be carried out using the distinct software modules of the system, as described above, executing on the one or more hardware processors. Further, a computer program product can include a tangible computer-readable recordable storage medium with code adapted to be executed to carry out one or more method steps described herein, including the provision of the system with the distinct software modules.

Additionally, the techniques depicted in FIG. 10 can be implemented via a computer program product that can include computer useable program code that is stored in a computer readable storage medium in a data processing system, and wherein the computer useable program code was downloaded over a network from a remote data processing system. Also, in one or more embodiments of the invention, the computer program product can include computer useable program code that is stored in a computer readable storage medium in a server data processing system, and wherein the computer useable program code is downloaded over a network to a remote data processing system for use in a computer readable storage medium with the remote system.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

One or more embodiments of the invention, or elements thereof, can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps.

One or more embodiments can make use of software running on a general purpose computer or workstation. With reference to FIG. 11, such an implementation might employ, for example, a processor 1102, a memory 1104, and an input/output interface formed, for example, by a display 1106 and a keyboard 1108. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor. The term “memory” is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory), ROM (read only memory), a fixed memory device (for example, hard drive), a removable memory device (for example, diskette), a flash memory and the like. In addition, the phrase “input/output interface” as used herein, is intended to include, for example, one or more mechanisms for inputting data to the processing unit (for example, mouse), and one or more mechanisms for providing results associated with the processing unit (for example, printer). The processor 1102, memory 1104, and input/output interface such as display 1106 and keyboard 1108 can be interconnected, for example, via bus 1110 as part of a data processing unit 1112. Suitable interconnections, for example via bus 1110, can also be provided to a network interface 1114, such as a network card, which can be provided to interface with a computer network, and to a media interface 1116, such as a diskette or CD-ROM drive, which can be provided to interface with media 1118.

Accordingly, computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in one or more of the associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.

A data processing system suitable for storing and/or executing program code will include at least one processor 1102 coupled directly or indirectly to memory elements 1104 through a system bus 1110. The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation.

Input/output or I/O devices (including but not limited to keyboards 1108, displays 1106, pointing devices, and the like) can be coupled to the system either directly (such as via bus 1110) or through intervening I/O controllers (omitted for clarity).

Network adapters such as network interface 1114 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

As used herein, including the claims, a “server” includes a physical data processing system (for example, system 1112 as shown in FIG. 11) running a server program. It will be understood that such a physical server may or may not include a display and keyboard.

As noted, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon. Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Media block 1118 is a non-limiting example. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that any of the methods described herein can include an additional step of providing a system comprising distinct software modules embodied on a computer readable storage medium; the modules can include, for example, any or all of the components shown in FIG. 9. The method steps can then be carried out using the distinct software modules and/or sub-modules of the system, as described above, executing on one or more hardware processors 1102. Further, a computer program product can include a computer-readable storage medium with code adapted to be implemented to carry out one or more method steps described herein, including the provision of the system with the distinct software modules.

In any case, it should be understood that the components illustrated herein may be implemented in various forms of hardware, software, or combinations thereof; for example, application specific integrated circuit(s) (ASICs), functional circuitry, one or more appropriately programmed general purpose digital computers with associated memory, and the like. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

At least one embodiment of the invention may provide one or more beneficial effects, such as, for example, automatic dynamic threshold determination based on the temporal and/or spatial domain.

It will be appreciated and should be understood that the exemplary embodiments of the invention described above can be implemented in a number of different fashions. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the invention. Indeed, although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art.

1. A method for performing visual surveillance of one or more moving objects, wherein the method comprises: registering one or more images captured by one or more cameras, wherein registering the one or more images comprises region-based registration of the one or more images in two or more adjacent frames; performing motion segmentation of the one or more images to detect one or more moving objects and one or more background regions in the one or more images; and tracking the one or more moving objects to facilitate visual surveillance of the one or more moving objects.
2. The method of claim 1, wherein registering one or more images comprises recursive global and local geometric registration of the one or more images.
3. The method of claim 1, wherein registering one or more images comprises using one or more sub-pixel image matching techniques.
4. The method of claim 1, wherein performing motion segmentation of the one or more images comprises forward and backward frame differencing.
5. The method of claim 4, wherein forward and backward frame differences comprises automatic dynamic threshold estimation based on at least one of temporary filtering and spatial filtering.
6. The method of claim 4, wherein forward and backward frame differences comprises removing one or more false moving pixels based on independent motions of one or more image features.
7. The method of claim 4, wherein forward and backward frame differences comprises performing a morphological operation and generating one or more motion pixels.
8. The method of claim 1, wherein tracking the one or more moving objects comprises performing hybrid target tracking, wherein hybrid target tracking comprises using a Kanade-Lucas-Tomasi feature tracker and meanshift, using auto kernel scale estimation and updating, and using one or more feature trajectories.
9. The method of claim 1, wherein tracking the one or more moving objects comprises using one or more multi-target tracking algorithms based on feature matching and distance matrices for one or more targets.
10. The method of claim 1, wherein tracking the one or more moving objects comprises: generating a motion map; identifying one or more moving objects; performing object initialization and object checking; identifying one or more object regions in the motion map; extracting one or more features; setting a search region in the motion map; identifying one or more candidate regions in the motion map; meanshift tracking; identifying one or more moving objects in the one or more candidate regions; performing Kanade-Lucas-Tomasi feature matching; performing an affine transform; making a final regions determination via the Bhattacharyya coefficient; and updating a target model and trajectory information.
11. The method of claim 1, wherein tracking the one or more moving objects comprises reference plane-based registration and tracking.
12. The method of claim 1, further comprising relating each camera view with one or more other camera views.
13. The method of claim 1, further comprising forming a panoramic view from the one or more images captured by one or more cameras.
14. The method of claim 13, further comprising estimating motion of each camera based on video information of one or more static objects in the panoramic view.
15. The method of claim 13, further comprising estimating one or more background structures in the panoramic view based on linear structure detection and statistical analysis of the one or more moving objects over a period of time.
16. The method of claim 1, further comprising automatic feature extraction, wherein automatic feature extraction comprises: framing an image; performing a Gaussian smoothing operation; using a canny detector to extract one or more feature edges; implementing a hough transformation for feature analysis; determining a maximum response finding for reducing an influence of multiple peaks in a transform space; determining if a length of a feature is greater than a certain threshold, and if the length of the feature is greater than the threshold, performing feature extraction and pixel removal.
17. The method of claim 16, wherein automatic feature extraction further comprises performing frame differencing and verification via motion history images.
18. The method of claim 1, further comprising performing outlier removal to remove one or more incorrect moving object matches.
19. The method of claim 1, further comprising false blob filtering, wherein false blob filtering comprises: generating a motion map; applying a connected component process to link each blob data; creating a motion blob table; extracting one or more features for each blob in a previously registered frame; and applying a Kanade-Lucas-Tomasi method to estimate motion of each blob, and, if no motion occurs for a blob, deleting the blob from the blob table.
20. The method of claim 1, further comprising updating a target model on at least one of a temporal domain and a spatial domain.
21. The method of claim 1, further comprising creating an index of object appearances and object tracks in a panoramic view.
22. The method of claim 21, further comprising determining a similarity metric between a query and an entry in the index.
23. The method of claim 1, further comprising providing a system, wherein the system comprises one or more distinct software modules, each of the one or more distinct software modules being embodied on a tangible computer-readable recordable storage medium, and wherein the one or more distinct software modules comprise a geometric registration module, a motion extraction module and an object tracking module executing on a hardware processor.